Discover Valuable Insights from Audio Transcripts Generated by Amazon Transcribe with the Help of Amazon Bedrock

Artificial Intelligence

Generative AI is redefining the landscape of data analysis, particularly in the realm of audio and video transcripts. This innovative technology enhances our capability to extract meaningful insights from content stored in audio or video formats. Given the complexity and uniqueness of speech data, analyzing it for actionable insights can be a daunting task. Traditional methods often require labor-intensive transcription and analysis, consuming both time and resources.

While tools for automatic speech recognition can convert audio and video to text, organizations still rely on manual processes to derive specific insights or summaries, which can be laborious. As the volume of such content grows, the demand for a more efficient and insightful solution intensifies. There’s a significant opportunity to unlock business value from the vast amounts of data stored in these formats, revealing insights that might otherwise remain hidden. Here are several new capabilities made possible by large language models (LLMs) when applied to audio transcripts:

  • Context Comprehension: LLMs can comprehend conversation context, discerning not just spoken words but also implied meanings, intentions, and emotions, which previously required extensive human interpretation.
  • Advanced Sentiment Analysis: LLMs can perform sentiment analysis that goes beyond basic emotions, capturing nuances like sarcasm and ambivalence by understanding contextual cues.
  • Concise Summaries: LLMs can produce concise summaries by grasping the conversation’s context rather than merely extracting text.
  • Natural Language Inquiries: Users can pose complex, natural language inquiries and receive insightful responses.
  • Persona Identification: LLMs can identify personas or roles within a discussion, allowing for targeted insights and actions.
  • Content Generation: With LLMs, new content can be generated based on audio materials or conversations, following set templates or structures.

In this article, we explore how to drive business value through speech analytics, focusing on examples like:

  • Automatically summarizing and categorizing marketing materials, such as podcasts, recorded interviews, or videos, and creating new promotional content from those assets.
  • Extracting key points, summaries, and sentiments from recorded meetings, such as earnings calls.
  • Transcribing and analyzing contact center calls to enhance customer experience.

To harness these audio insights, the first step is transcribing the audio file using Amazon Transcribe, a managed machine learning service that converts speech to text. This tool enables developers to easily integrate speech-to-text capabilities into their applications. It recognizes multiple speakers, automatically redacts personally identifiable information (PII), and enhances transcription accuracy with custom vocabularies tailored to specific industries or use cases.
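As a hedged illustration of those Amazon Transcribe features, the sketch below starts a job with speaker labels, PII redaction, and a custom vocabulary. The vocabulary name, file locations, and job name are placeholders, and the custom vocabulary is assumed to have been created beforehand.

import boto3
import time

transcribe = boto3.client('transcribe')

# Hypothetical job showing speaker labels, PII redaction, and a custom vocabulary
transcribe.start_transcription_job(
    TranscriptionJobName=f"redacted-call-{int(time.time())}",
    LanguageCode='en-US',
    MediaFormat='mp4',
    Media={'MediaFileUri': '<S3 URI of the media file>'},
    OutputBucketName='<name of the output S3 bucket>',
    Settings={
        'ShowSpeakerLabels': True,              # label individual speakers
        'MaxSpeakerLabels': 2,
        'VocabularyName': 'my-industry-terms'   # placeholder custom vocabulary
    },
    ContentRedaction={
        'RedactionType': 'PII',                 # redact personally identifiable information
        'RedactionOutput': 'redacted'           # return only the redacted transcript
    }
)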

Next, foundation models (FMs) within Amazon Bedrock can be employed to summarize content, identify themes, and draw conclusions, extracting valuable insights that can inform strategic decisions and drive innovation. Additionally, the automatic generation of new content fosters creativity and productivity.

Generative AI is transforming how we analyze audio transcripts, allowing organizations to uncover insights such as customer sentiment, pain points, recurring themes, and potential risk mitigation strategies that were previously obscured.

Use Case Overview

This article examines three detailed use cases, with code artifacts provided in Python. We utilized a Jupyter notebook to execute the code snippets, which you can replicate by creating and running a notebook in Amazon SageMaker Studio.

For instance, we demonstrate how to transform an existing marketing asset (a video) into a new blog post announcing its launch while extracting key topics and search engine optimization (SEO) keywords for documentation and categorization.

To transcribe audio with Amazon Transcribe, we used a technical talk from AWS re:Invent 2023 as a sample. We downloaded the MP4 recording and stored it in an Amazon Simple Storage Service (Amazon S3) bucket.
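If you want to reproduce this step, a minimal sketch of the upload with the AWS SDK for Python (Boto3) follows; the local file name, bucket, and key are placeholders.

import boto3

s3 = boto3.client('s3')
# Placeholder file, bucket, and key names; replace with your own
s3.upload_file(
    Filename='reinvent-2023-talk.mp4',
    Bucket='<name of your S3 bucket>',
    Key='transcribe_bedrock_blog/input-files/reinvent-2023-talk.mp4'
)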

The transcription process begins by specifying the audio/video file’s S3 location:

import boto3
import time
import random

transcribe = boto3.client('transcribe')

# Start an asynchronous transcription job for the MP4 recording stored in Amazon S3
response = transcribe.start_transcription_job(
    TranscriptionJobName=f"podcast-transcription-{int(time.time())}_{random.randint(1000, 9999)}",
    LanguageCode='en-US',
    MediaFormat='mp4',
    Media={
        'MediaFileUri': '<S3 URI of the media file>'
    },
    OutputBucketName='<name of the S3 bucket that will store the output>',
    OutputKey='transcribe_bedrock_blog/output-files/',
    Settings={
        'ShowSpeakerLabels': True,  # label each speaker in the transcript
        'MaxSpeakerLabels': 3
    }
)

# Poll the job status until the transcription completes or fails
max_tries = 60
while max_tries > 0:
    max_tries -= 1
    job = transcribe.get_transcription_job(TranscriptionJobName=response['TranscriptionJob']['TranscriptionJobName'])
    job_status = job["TranscriptionJob"]["TranscriptionJobStatus"]
    if job_status in ["COMPLETED", "FAILED"]:
        if job_status == "COMPLETED":
            print(
                f"Download the transcript from\n"
                f"\t{job['TranscriptionJob']['Transcript']['TranscriptFileUri']}."
            )
        break
    else:
        print(f"Waiting for {response['TranscriptionJob']['TranscriptionJobName']}. Current status is {job_status}.")
    time.sleep(10)

The transcription job will take a few minutes to complete. Once it finishes, you can download the output from Amazon S3 and print the generated plain text transcript:

import json

s3 = boto3.client('s3')

# The TranscriptFileUri has the form https://<s3-endpoint>/<bucket>/<key>;
# split it into the bucket name and object key
transcript_uri = job['TranscriptionJob']['Transcript']['TranscriptFileUri']
output_bucket = transcript_uri.split('https://')[1].split('/', 2)[1]
output_file_key = transcript_uri.split('https://')[1].split('/', 2)[2]

# Download the JSON output and extract the plain text transcript
s3_response_object = s3.get_object(Bucket=output_bucket, Key=output_file_key)
object_content = s3_response_object['Body'].read()

transcription_output = json.loads(object_content)

print(transcription_output['results']['transcripts'][0]['transcript'])

With the plain text transcript in hand, you can analyze the audio data and use the resulting insights to make informed decisions in a business context.
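As a minimal sketch of the next step, the transcript can be passed to a foundation model in Amazon Bedrock through the Converse API. The model ID, prompt, and inference parameters below are assumptions; adapt them to your use case and to the models enabled in your account.

bedrock = boto3.client('bedrock-runtime')

transcript_text = transcription_output['results']['transcripts'][0]['transcript']

# Example prompt; adjust it for the insights you want (summary, sentiment, SEO keywords, and so on)
prompt = (
    "Summarize the following technical talk in five bullet points and list "
    f"ten SEO keywords:\n\n{transcript_text}"
)

# The model ID is a placeholder; choose any text model available to you in Amazon Bedrock
response = bedrock.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    messages=[{'role': 'user', 'content': [{'text': prompt}]}],
    inferenceConfig={'maxTokens': 1024, 'temperature': 0.2}
)

print(response['output']['message']['content'][0]['text'])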

